Speech Recognition Supported by Prosodic Information for Fixed Stress Languages

Authors

  • György Szaszák
  • Klára Vicsi
Abstract

In this paper we examine the use of prosodic features in speech recognition, with special attention paid to agglutinating and fixed stress languages. The prosodic features used, the acoustic-prosodic preprocessing, and the segmentation in terms of prosodic units are presented in detail. We use the expression "prosodic unit" to distinguish these units from prosodic phrases, which are longer. We trained an HMM-based prosodic segmenter relying on the fundamental frequency and intensity of speech. The output of the prosodic segmenter is used for N-best lattice rescoring, in parallel with a simplified bigram language model, in a continuous speech recognizer in order to improve speech recognition performance. Experiments on Hungarian show a WER reduction of about 4% using a simple lattice rescoring.
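
The abstract does not spell out how the rescoring is done, so the following is only a minimal sketch of one way an N-best list could be rescored with the output of a prosodic segmenter. It assumes that in a fixed stress language a detected prosodic unit boundary tends to coincide with a word start, and it rewards hypotheses whose word starts match those boundaries. The names `Hypothesis`, `prosodic_bonus`, `rescore`, the `weight` and `tolerance` parameters, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hypothesis:
    words: List[str]
    word_starts: List[float]  # start time of each word, in seconds
    base_score: float         # recognizer log score (acoustic + language model)


def prosodic_bonus(word_starts: List[float],
                   unit_boundaries: List[float],
                   tolerance: float = 0.05) -> float:
    """Fraction of prosodic unit boundaries lying within `tolerance`
    seconds of some word start in the hypothesis."""
    if not unit_boundaries:
        return 0.0
    hits = sum(
        any(abs(boundary - start) <= tolerance for start in word_starts)
        for boundary in unit_boundaries
    )
    return hits / len(unit_boundaries)


def rescore(nbest: List[Hypothesis],
            unit_boundaries: List[float],
            weight: float = 2.0,
            tolerance: float = 0.05) -> List[Tuple[float, Hypothesis]]:
    """Add a weighted prosodic bonus to each base score and sort best-first.
    The linear combination is an assumption made for illustration."""
    scored = [
        (h.base_score + weight * prosodic_bonus(h.word_starts, unit_boundaries, tolerance), h)
        for h in nbest
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Made-up example: boundaries as they might come from a prosodic segmenter.
boundaries = [0.0, 0.62, 1.30]
hyp_a = Hypothesis(["w1", "w2", "w3"], [0.0, 0.40, 1.30], base_score=-10.0)
hyp_b = Hypothesis(["v1", "v2", "v3"], [0.0, 0.60, 1.31], base_score=-10.5)
ranked = rescore([hyp_a, hyp_b], boundaries)
best = ranked[0][1]
```

In this made-up example `hyp_b` starts with the lower recognizer score, but all three prosodic boundaries fall on its word starts, so it ends up ranked first. In practice the weight would have to be tuned on held-out data.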

Similar resources

Using prosody to improve automatic speech recognition

In this paper acoustic processing and modelling of the supra-segmental characteristics of speech is addressed, with the aim of incorporating advanced syntactic and semantic level processing of spoken language for speech recognition/understanding tasks. The proposed modelling approach is very similar to the one used in standard speech recognition, where basic HMM units (the most often acoustic p...

Automatic Annotation of Speech Corpora for Prosodic Prominence

This paper presents a study on the automatic detection of prosodic prominence in continuous speech, with particular reference to American English, but with good prospects of application to other languages. Perceptual prosodic prominence is supported by two different prosodic features: pitch accent and stress. Pitch accent is acoustically connected with fundamental frequency (F0) movements and o...

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

Word segmentation in Persian continuous speech using F0 contour

Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...

An evaluation of keyword spotting performance utilizing false alarm rejection based on prosodic information

In this paper, we describe our effort to develop a new method of false alarm rejection for keyword-spotting speech recognition systems. This false alarm rejection uses prosodic similarities and works on a posterior rescoring basis. In keyword spotting, there is always a false alarm problem. Here, we propose a technique to reject those false alarms using prosodic features. In Japanese, prosod...

Publication date: 2007